A Chinese LPCFG Parser with Hybrid Character Information

Authors

  • Wenzhi Xu
  • Chaobo Sun
  • Caixia Yuan
Abstract

We present a new probabilistic model based on the lexical PCFG model that readily exploits Chinese character information to address the sparseness of lexical information in lexical PCFG models. In particular, we discuss several important features that improve parsing performance, and we describe a strategy for modifying the original label structure to reduce label ambiguity. The final experiment demonstrates that both the character information and the label modification improve parsing performance.

Similar resources

A Maximum Entropy Chinese Character-Based Parser

The paper presents a maximum entropy Chinese character-based parser trained on the Chinese Treebank (“CTB” henceforth). Word-based parse trees in CTB are first converted into character-based trees, where word-level part-of-speech (POS) tags become constituent labels and character-level tags are derived from word-level POS tags. A maximum entropy parser is then trained on the character-based corpu...

A Hybrid Chinese Information Retrieval Model

A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document that contains the query term as a sequence of Chinese characters may not be really relevant to the query since the query term (as a sequence of Chinese characters) may n...

Chinese Parsing in a Phoneme-to-Character Conversion System Based on Semantic Pattern Matching

We have recently developed a Chinese phoneme-to-character conversion system with a conversion rate close to 96%. The underlying algorithm, called the context sensitive method, is based on "semantic pattern matching". The construction of these semantic patterns is largely based on linguistic common sense and corpus statistics. An interesting finding is that this method is well suited for many ot...

Pragmatic Chinese Lexical Analysis Based on Word-character Hybrid Model

In the field of information and natural language processing, Chinese lexical analysis is an important basic step for Chinese, Japanese, and other Asian languages. This paper presents Chinese lexical analysis that integrates word-level and character-level information, based on a hybrid model combining a word-based CRF model and a latent semi-CRF model. The word-lattice, which represents all candidate outputs, is...

Statistics Based Hybrid Approach To Chinese Base Phrase Identification

This paper extends base noun phrase (BNP) identification into research on Chinese base phrase identification. After briefly introducing some basic concepts of Chinese base phrases, this paper presents a statistics-based hybrid model for identifying 7 types of Chinese base phrases. Experiments show the efficiency of the proposed method in simplifying sentence structure. Significance ...

Publication date: 2010